New word acquisition using subword modeling

Authors

Ghinwa F. Choueiter

Stephanie Seneff

James R. Glass

Abstract

In this paper, we use subword modeling to learn the pronunciations and spellings of new words. The subwords are generated with a context-free grammar, and are intermediate units between phonemes and syllables. We first evaluate the effectiveness of the subword model in automatically generating the spelling and pronunciation of new words. Then the subword model is embedded in a multi-stage recognizer which consists of word, subword, and letter recognizers. In a preliminary set of experiments, the hybrid system outperforms a large-vocabulary isolated word recognizer. The subword model is also used to improve the performance of the letter recognizer by generating a spelling cohort which is used to train a small letter n-gram. The small letter n-gram has a reduced perplexity compared to a much larger n-gram, and can be used by the letter recognizer for the spoken spelling mode. This could translate to an improved letter error rate in future letter recognition experiments.
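The perplexity claim for the small letter n-gram can be illustrated with a toy sketch. The code below is not the authors' system: it assumes a letter bigram with add-one smoothing, and the spelling cohort, the general word list, and the test word are all made-up data chosen only to show why a model trained on a focused cohort can have lower perplexity on a matching spelling than a model trained on broader data.

```python
import math
from collections import Counter

# Assumption: 26 lowercase letters plus an end-of-word marker as the event space.
VOCAB_SIZE = 27

def pad(word):
    """Wrap a word in start and end markers so boundary letters are modeled."""
    return ["<s>"] + list(word) + ["</s>"]

def train_letter_bigram(words):
    """Count letter bigrams and their left contexts over a list of spellings."""
    bigrams, contexts = Counter(), Counter()
    for word in words:
        seq = pad(word)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts

def perplexity(model, words):
    """Per-letter perplexity under an add-one-smoothed bigram model."""
    bigrams, contexts = model
    log_prob, count = 0.0, 0
    for word in words:
        seq = pad(word)
        for prev, cur in zip(seq, seq[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + VOCAB_SIZE)
            log_prob += math.log2(p)
            count += 1
    return 2 ** (-log_prob / count)

# Hypothetical spelling cohort proposed for one spoken name, and a broader word list.
cohort = ["seneff", "senef", "senneff", "seneph"]
general = ["cat", "dog", "house", "tree", "stone", "river", "send", "never"]

cohort_model = train_letter_bigram(cohort)
general_model = train_letter_bigram(general)

test_words = ["seneff"]
ppl_cohort = perplexity(cohort_model, test_words)
ppl_general = perplexity(general_model, test_words)
print(f"cohort model perplexity:  {ppl_cohort:.2f}")
print(f"general model perplexity: {ppl_general:.2f}")
```

On this toy data the cohort-trained model assigns a higher probability to every letter transition of the test spelling, so its perplexity is lower. Whether that carries over to letter error rate is, as the abstract notes, left to future experiments.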

Similar resources

Reversible Sound-to-Letter/Letter-to-Sound Modeling Based on Syllable Structure

This paper describes a new grapheme-to-phoneme framework, based on a combination of formal linguistic and statistical methods. A context-free grammar is used to parse words into their underlying syllable structure, and a set of subword “spellneme” units encoding both phonemic and graphemic information can be automatically derived from the parsed words. A statistical n-gram model can then be train...

Improved Subword Modeling for WFST-Based Speech Recognition

Because in agglutinative languages the number of observed word forms is very high, subword units are often utilized in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating modified lexic...

The use of subword linguistic modeling for multiple tasks in speech recognition

Over the past several years, I have been conducting research on subword modeling in speech recognition. The research is most specifically aimed at the difficult task of identifying and characterizing unknown words, although the proposed framework also has utility in other recognition tasks such as phonological and prosodic modeling. The approach exploits the linguistic substructure of words by ...

Data-driven pronunciation modeling for ASR using acoustic subword units

We describe a method to model pronunciation variation for ASR in a data-driven way, namely by use of automatically derived acoustic subword units. The inventory of units is designed so as to produce maximally separable pronunciation variants of words while at the same time only the most important variants for the particular application are trained. In doing so, the optimal number of variants per ...

Comparison of whole word and subword modeling techniques for speaker verification with limited training data

In this paper we use whole word and subword hidden Markov models for text-dependent speaker verification. In this application usually only a small amount of training data is available for each model. In order to cope with this limitation we propose an intermediate functional representation of the training data allowing the robust initialization of the models. This new approach is tested with two ...

Publication date: 2007
